Cosmic microwave background constraints on cosmological models 
with large-scale isotropy breaking 
Several anomalies appear to be present in the large-angle cosmic microwave background (CMB)
anisotropy maps of WMAP, including the alignment of large-scale multipoles. Models in which 
isotropy is spontaneously broken (e.g., by a scalar field) have been proposed as explanations for 
these anomalies, as have models in which a preferred direction is imposed during inflation. We 
examine models inspired by these, in which isotropy is broken by a multiplicative factor with dipole 
and/or quadrupole terms. We evaluate the evidence provided by the multipole alignment using a 
Bayesian framework, finding that the evidence in favor of the model is generally weak. We also 
compute approximate changes in estimated cosmological parameters in the broken-isotropy models. 
Only the overall normalization of the power spectrum is modified significantly. 

PACS numbers: 98.80.-k,98.70.Vc, 98.80.Es, 95.85.Bh 



I. INTRODUCTION 

Our understanding of cosmology has advanced ex- 
tremely rapidly in the past decade. These advances 
are due in large part to observations of cosmic mi- 
crowave background (CMB) anisotropy, particularly the 
data from the Wilkinson Microwave Anisotropy Probe 
(WMAP) [1-4]. As a result of these and other obser- 
vations, a "standard model" of cosmology has emerged, 
consisting of a Universe dominated by dark energy and 
cold dark matter, with a nearly scale-invariant spectrum 
of Gaussian adiabatic perturbations [5, 6] of the sort that
would naturally be produced in an inflationary epoch. 

The overall consistency of the CMB data with this 
model is quite remarkable. In particular, the CMB obser- 
vations are very nearly Gaussian, and the angular power 
spectrum matches theoretical models very well from 
scales of tens of degrees down to arcminutes. However, 
several anomalies have been noted on the largest angular 
scales, including a lack of large-scale power 00,01, alignment
of low-order multipoles [8-11], and hemispheric
asymmetries [12-14]. Some anomalies seem to be associated
with the ecliptic plane, suggesting the possibility
of a systematic error associated with the WMAP scan 
pattern, perhaps related to coupling of the scan pattern
with the asymmetric beam [15]. If the anomalies have 
cosmological significance, then naturally the correlation 
with the ecliptic plane must be a coincidence. 

The significance of and explanations for these puzzles 
are hotly debated. In particular, it is difficult to know 
how to interpret a posteriori statistical significances: 
when a statistic is invented to quantify an anomaly that 
has already been noticed, the low p-values for that statistic
cannot be taken at face value.

One can (and from a formal statistical point of view, 
arguably one must) dismiss this entire subject on the
ground that all such anomalies are characterized only by
invalid a posteriori statistics [16]. Nonetheless, the number
and nature of the anomalies (in particular, the fact
that several seem to pick out the same directions on the 
sky) seem to suggest that there may be something to 
explain in the data. Given the potential importance of 
new discoveries about the Universe's largest observable 
scales, and the difficulty in obtaining a new data set that 
would allow for a priori statistical analysis, we believe 
that the potential anomalies are worth further examina- 
tion. In this paper, we will tentatively assume that there 
is a need for an explanation and consider what that ex- 
planation might be. 

One of the most robust of the large-scale anomalies 
found in WMAP is a lack of large-scale power, as quanti- 
fied either by the low quadrupole or the vanishing of the 
two-point correlation function at large angles 0, 0, H|. 
If this anomaly is real, then it provides strong evidence 
against a broad class of nonstandard models. To be spe- 
cific, all models in which a statistically independent con- 
taminant (whether due to a foreground, systematic error, 
or exotic cosmology) is added to the data will necessar- 
ily fare worse than the standard model in explaining this 
anomaly [17, 18]. There is a simple reason for this: a
statistically independent additive contaminant always in- 
creases the root-mean-square power in any given mode, 
reducing the probability of finding low power. 

It is natural, therefore, to seek an explanation of the 
anomalies among models that do not involve a mere ad- 
ditive contaminant. One simple phenomenological model 
is a multiplicative contaminant, in which the original statistically
isotropic CMB signal $T^{(0)}(\theta,\phi)$ is modulated by
a multiplicative factor, leading to an observed signal

$$T(\theta,\phi) = f(\theta,\phi)\,T^{(0)}(\theta,\phi). \tag{1}$$
This model arises naturally in the framework of spontaneous
isotropy breaking by a scalar field [18]. Moreover,
models based on the existence of a vector field specifying
a preferred direction during inflation [19, 20] produce
similar modulation, but with $f$ having specifically
a quadrupolar form. To be precise, the modulation in
these models takes place in the primordial power spec- 
trum P(k), which acquires a quadrupolar dependence on 
the direction of k. The full effect on the CMB anisotropy 
is more complicated than the above model, but the dom- 
inant effect on large scales is, at least approximately, a 
quadrupolar modulation of the above form.$^1$

Since our goal is to explain the observed large-scale 
anomalies while maintaining the success of the standard 
model on smaller scales, it is natural to consider models
in which $f$ has power only on large scales. We will
consider three classes of model: one in which $f$ has only
monopole and dipole terms, one in which it has monopole 
and quadrupole, and one in which it has all three. We will 
refer to these as the dipole-only, quadrupole-only, and 
dipole-quadrupole models. The quadrupole-only model 
is inspired by the theory of a preferred direction dur- 
ing inflation, while the others are inspired by the general 
isotropy-breaking framework. 

This paper addresses the following central question. 
Do the broken-isotropy models provide an explanation 
for one of the main observed anomalies, namely the sur- 
prising alignment between the quadrupole and octupole 
(multipoles $l = 2, 3$)? To examine this question, we
choose statistics to quantify the anomaly and use these 
statistics to assess goodness of fit of the data to the dif- 
ferent models. Several different statistics are chosen in 
order to assess the robustness of the results. 

Because the statistics are most naturally computed in 
spherical harmonic space, we use the all-sky internal lin- 
ear combination (ILC) maps from the five-year WMAP 
data release [22]. There is bound to be residual foreground
contamination in the ILC maps [13, HiJ . Section
VI contains a brief discussion of the effects of this contamination.

Naturally, because the anisotropic models have more 
free parameters than the standard model (and indeed 
include the standard model as a special case), there 
will generically be parameter choices that make the 
anisotropic model fit the data better. We adopt the 
Bayesian evidence criterion to assess whether this im- 
proved fit is sufficient to justify the additional complex- 
ity of the anisotropic model. Bayesian evidence has been 
used in addressing this sort of question in the past [25-27].
Although some controversy has arisen over its use
in cosmology (e.g., [28-31]), in this context it is both a
simple and a natural criterion to adopt.

In some cases, the Bayesian evidence ratios are greater 
than one, meaning that one's assessment of the probabil- 
ity that the broken-isotropy models are true should rise 
as a result of the CMB anomalies. However, in all cases, 
the improvement is modest, providing at most weak support
for the adoption of the anisotropic models.

1 The stability of the specific model of ref. [19] has been questioned
[21]; nonetheless, we believe it is worthwhile to consider models
of this general class.

We also consider the changes in parameter estimates 
that would arise if the anisotropic models are correct. 
To be specific, because we assume that the modulation 
is a perturbation to the standard model, we assume that 
the unmodulated temperature map $T^{(0)}$ is derived from
the cosmological parameters in the usual way, i.e., its
power spectrum is given by CMBFAST [32]. If there is
a nonconstant modulation function $f$, then parameter
estimates based on the observed data will naturally
differ from the true values. We estimate the resulting 
parameter shifts, finding them to be minor. 

The remainder of this paper is structured as follows. 
Section II specifies precisely the anisotropic models under
consideration and describes how we simulate these
models. In Section III we review the method for computing
Bayesian evidence ratios. Section IV contains our
main results, indicating the degree to which the multipole
alignment, quantified in several different ways, favors
the broken-isotropy models. In Section V we quantify
the degree to which best-fit cosmological parameters are
modified by changing from the standard model to the
broken-isotropy models. Section VI discusses some aspects
of the issue of foreground contamination. Finally,
we provide a brief discussion of our results in Section VII.

II. SIMULATING ANISOTROPIC MODELS 

The statistical properties of a CMB map are most eas- 
ily expressed in terms of the spherical harmonic expan- 
sion, 

$$T(\theta,\phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} a_{lm} Y_{lm}(\theta,\phi). \tag{2}$$

The monopole ($l = 0$) term in the sum is simply the
average temperature over the sky, and the dipole ($l = 1$)
terms cannot be separated from the kinematic dipole due
to our motion with respect to the CMB "rest" frame.
These terms are typically removed from the data, so that
in practice the sum starts at $l = 2$. For compactness, we
will generally abbreviate such double sums as $\sum_{l,m}$, not
writing the limits explicitly unless confusion may arise.

In the standard model, the CMB map $T^{(0)}(\theta,\phi)$ is a
realization of a statistically isotropic Gaussian random
process. This means that the spherical harmonic coefficients
$a^{(0)}_{lm}$ of this map are independent Gaussian random
variables with mean zero and variances that depend only
on $l$:

$$\langle |a^{(0)}_{lm}|^2 \rangle = C_l^{(0)}, \tag{3}$$

where $\langle\cdot\rangle$ denotes an ensemble average and $C_l^{(0)}$ is the
power spectrum.

In broken-isotropy models, on the other hand, we assume
that the observed field is related to the above statistically
isotropic expression according to equation (1). We
expand the modulation function in spherical harmonics,

$$f(\theta,\phi) = 1 + \sum_{l,m} f_{lm} Y_{lm}(\theta,\phi). \tag{4}$$

We assume that the modulation function is normalized
to have mean one, so that the above sum starts at $l = 1$.
(Equivalently, we could omit the $1+$ in the above
expression and start the sum at $l = 0$ with $f_{00} = \sqrt{4\pi}$.)
We will assume that the coefficients $f_{lm}$ are independent
Gaussian random variables with a power spectrum

$$C_l^{(f)} = \langle |f_{lm}|^2 \rangle. \tag{5}$$



As noted in the Introduction, we consider models in
which $f$ has only dipole and/or quadrupole terms. We
parameterize these terms with parameters $\sigma_1$, $\sigma_2$, giving
the rms values of $f_{lm}$ relative to a scale-invariant spectrum
$C_l^{(f)} \propto [l(l+1)]^{-1}$:

$$\sigma_1^2 = 2C_1^{(f)}, \qquad \sigma_2^2 = 6C_2^{(f)}. \tag{6}$$

Because the spherical harmonics have root-mean-square
(rms) value $(4\pi)^{-1/2}$, these modulations have rms amplitudes
$(8\pi)^{-1/2}\sigma_1 = 0.20\sigma_1$ and $(24\pi)^{-1/2}\sigma_2 = 0.12\sigma_2$,
respectively.
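
As a quick numerical check of the amplitudes quoted above, the following minimal Python sketch evaluates the rms of a single term $f_{lm}Y_{lm}$, assuming the normalization $\sigma_l^2 = l(l+1)C_l^{(f)}$ and a spherical-harmonic rms of $(4\pi)^{-1/2}$. The function name is illustrative only and is not taken from the analysis code of this paper.

```python
import math

def modulation_rms(l, sigma):
    """rms of one term f_lm * Y_lm, assuming sigma^2 = l(l+1) C_l^(f)."""
    c_l = sigma ** 2 / (l * (l + 1))          # modulation power at multipole l
    return math.sqrt(c_l / (4.0 * math.pi))   # Y_lm has rms (4 pi)^(-1/2)

if __name__ == "__main__":
    print(round(modulation_rms(1, 1.0), 2))   # dipole, sigma_1 = 1
    print(round(modulation_rms(2, 1.0), 2))   # quadrupole, sigma_2 = 1
```

With $\sigma_1 = \sigma_2 = 1$ this reproduces the coefficients 0.20 and 0.12 to two decimal places.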

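The spherical harmonic coefficients of the observed sky
are found as usual by spherical harmonic orthonormality:

$$a_{lm} = \int d\Omega\, T(\theta,\phi)\, Y^*_{lm}(\theta,\phi) \tag{7}$$

$$\phantom{a_{lm}} = \int d\Omega\, f(\theta,\phi)\, T^{(0)}(\theta,\phi)\, Y^*_{lm}(\theta,\phi). \tag{8}$$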



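Expanding the functions $T^{(0)}$ and $f$ in spherical harmonics,
we find that

$$a_{lm} = \sum_{l_1,m_1} \sum_{l_2,m_2} a^{(0)}_{l_1 m_1} f_{l_2 m_2} I_{l l_1 l_2 m m_1 m_2}, \tag{9}$$

where $I_{l l_1 l_2 m m_1 m_2}$ represents an integral over three spherical
harmonics, which can be expressed in terms of Wigner
3-j symbols:

$$I_{l l_1 l_2 m m_1 m_2} = \int d\Omega\, Y_{l_1 m_1} Y_{l_2 m_2} Y^*_{lm} \tag{10}$$

$$\phantom{I_{l l_1 l_2 m m_1 m_2}} = (-1)^m \sqrt{\frac{(2l+1)(2l_1+1)(2l_2+1)}{4\pi}} \begin{pmatrix} l & l_1 & l_2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} l & l_1 & l_2 \\ -m & m_1 & m_2 \end{pmatrix}. \tag{11}$$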



The quadruple sum in equation (9) has very few
nonzero terms and hence can be quickly evaluated. Because
our model includes only low-$l$ power in $f$, the sum
over $l_2$ ranges from 0 to at most 2. Moreover, the Wigner
3-j symbols vanish unless certain conditions are satisfied.
First, $(l, l_1, l_2)$ must satisfy a triangle inequality, so that
the sum over $l_1$ ranges from $l - 2$ (or zero, whichever is
greater) to $l + 2$. Second, the first of the two 3-j symbols
vanishes unless $l + l_1 + l_2$ is even. Finally, the constraint
$m = m_1 + m_2$ must be satisfied.
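
The selection rules above can be illustrated with a short Python sketch. It evaluates the three-harmonic integral through the Racah formula for the Wigner 3-j symbols and applies the modulation sum to a dictionary of coefficients. The function names (`wigner_3j`, `gaunt`, `modulate`) and the dictionary layout are illustrative assumptions, not the code used for the simulations in this paper; the overall $(-1)^m$ phase assumes the Condon-Shortley convention for $Y_{lm}$, and the monopole $f_{00} = \sqrt{4\pi}$ is assumed to be included in the modulation coefficients.

```python
import math

def wigner_3j(j1, j2, j3, m1, m2, m3):
    """Wigner 3-j symbol for integer arguments (Racah formula)."""
    if m1 + m2 + m3 != 0:
        return 0.0
    if j3 < abs(j1 - j2) or j3 > j1 + j2:      # triangle inequality
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(m3) > j3:
        return 0.0
    f = math.factorial
    delta = (f(j1 + j2 - j3) * f(j1 - j2 + j3) * f(-j1 + j2 + j3)
             / f(j1 + j2 + j3 + 1))
    norm = math.sqrt(delta * f(j1 + m1) * f(j1 - m1) * f(j2 + m2)
                     * f(j2 - m2) * f(j3 + m3) * f(j3 - m3))
    kmin = max(0, j2 - j3 - m1, j1 - j3 + m2)
    kmax = min(j1 + j2 - j3, j1 - m1, j2 + m2)
    total = 0.0
    for k in range(kmin, kmax + 1):
        total += (-1) ** k / (f(k) * f(j1 + j2 - j3 - k) * f(j1 - m1 - k)
                              * f(j2 + m2 - k) * f(j3 - j2 + m1 + k)
                              * f(j3 - j1 - m2 + k))
    return (-1) ** (j1 - j2 - m3) * norm * total

def gaunt(l, l1, l2, m, m1, m2):
    """Integral of Y_{l1 m1} Y_{l2 m2} Y*_{l m} over the sphere."""
    pref = math.sqrt((2 * l + 1) * (2 * l1 + 1) * (2 * l2 + 1)
                     / (4.0 * math.pi))
    return ((-1) ** m * pref * wigner_3j(l, l1, l2, 0, 0, 0)
            * wigner_3j(l, l1, l2, -m, m1, m2))

def modulate(a0, f, lmax):
    """Modulated a_lm for l <= lmax.

    a0 and f map (l, m) to complex coefficients; f must contain the
    monopole f_00 = sqrt(4 pi) in addition to any dipole/quadrupole terms.
    """
    out = {}
    for l in range(lmax + 1):
        for m in range(-l, l + 1):
            total = 0j
            for (l2, m2), f_val in f.items():
                m1 = m - m2                        # m = m1 + m2
                for l1 in range(abs(l - l2), l + l2 + 1):
                    a_val = a0.get((l1, m1))
                    if a_val is not None:
                        total += a_val * f_val * gaunt(l, l1, l2, m, m1, m2)
            out[(l, m)] = total
    return out
```

With only the monopole term present, `modulate` returns the input coefficients unchanged, which is a convenient sanity check; a dipole term couples each $l$ only to $l \pm 1$, as the parity rule requires.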



III. BAYESIAN EVIDENCE 

Our goal in this paper will be to compare the standard 
model (the null hypothesis) with the class of broken- 
isotropy models. Naturally, because the latter class is 
broader, and indeed includes the null hypothesis as a 
limiting case, there will generically be members of the 
class that fit the data better than the standard model. 
The Bayesian evidence provides a framework for assess- 
ing whether the better fit found in the more complicated 
model is worth the Occam's-razor "cost." We now briefly 
review this approach to model comparison. 

Suppose that we have a model $M$ that depends on a
set of parameters $\theta$. Given a data set $D$, we define the
evidence of the model to be the probability density of $D$,
given the model $M$:

$$E(M) = \int d\theta\, P(D|M,\theta)\, \pi(\theta|M). \tag{12}$$



In this expression, $P$ is the likelihood function (that
is, the probability density for the data given the choice
of model and parameters) and $\pi$ is the prior probability
density of the model parameters. It may be helpful
to keep track of dimensions in these expressions. The
prior $\pi$ has dimensions of probability per unit volume
in parameter space, while P and E have dimensions of 
probability per unit volume in data space. 

Bayes's theorem says that the posterior probability of 
the model is proportional to the product of the model's 
prior probability and the evidence. Suppose now that we
have two models $M_1$, $M_2$ in mind, and imagine that, before
looking at the data set $D$, we regarded these models
as equally probable. Then the evidence ratio

$$\Lambda = \frac{E(M_1)}{E(M_2)} \tag{13}$$

is equal to the ratio of posterior probabilities.

In the case we consider in this paper, the two models 
are the standard model and the broken-isotropy model. 
The reader (like the writers) probably does not assign 
equal prior probabilities to these two models: in the 
absence of the WMAP anomalies, most of us probably 
thought that the broken-isotropy model was less likely.
Even in this case, the evidence ratio still tells us by what
factor the broken-isotropy model goes up in our estima- 
tion (relative to the isotropic model) as a result of the 
data. 
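
To make the role of unequal priors concrete, the sketch below applies the rule that posterior odds equal prior odds times the evidence ratio. The numerical values used in the example (prior odds of 1:9 and an evidence ratio of 3) are purely hypothetical and are not results of this paper; the function names are likewise illustrative.

```python
def posterior_odds(prior_odds, evidence_ratio):
    """Posterior odds M1:M2 = (prior odds M1:M2) * E(M1)/E(M2)."""
    return prior_odds * evidence_ratio

def odds_to_probability(odds):
    """Convert odds M1:M2 into the probability of M1 (two-model case)."""
    return odds / (1.0 + odds)

if __name__ == "__main__":
    # Hypothetical reader who assigns the broken-isotropy model a 10% prior.
    odds = posterior_odds(1.0 / 9.0, 3.0)
    print(odds_to_probability(odds))
```

In this hypothetical case the probability assigned to the broken-isotropy model rises from 10% to 25%: the model has gained ground, but remains disfavored.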

The Bayesian evidence automatically accounts for the 
degree of complexity of the model, in the sense that mod- 
els with a large parameter space will be automatically 
downweighted compared to those with a small parameter
space. To see this heuristically, suppose that the prior
probability $\pi$ is approximately flat over some volume $V_p$
in parameter space that is much larger than the range
over which the likelihood function is large. Then since
the probability distribution is normalized,

$$\int d\theta\, \pi(\theta|M) = 1, \tag{14}$$

we can estimate $\pi \approx V_p^{-1}$ over the range where the integrand
is significant. Thus we can crudely estimate

$$E(M) \approx V_p^{-1} \int d\theta\, P(D|M,\theta) \approx P_{\max} \frac{V_L}{V_p}, \tag{15}$$

where $P_{\max}$ is the peak of the likelihood function and
$V_L$ is an estimate of the volume in parameter space over
which the likelihood differs significantly from zero. If
we consider two models with similarly good fits to the
data (i.e., similar values of $P_{\max}$), the one with a higher
value of the ratio $V_L/V_p$ will have the higher value of the
evidence. In other words, the Bayesian evidence disfavors
models with a large volume of "wasted" parameter
space. When comparing models with parameter spaces of
different dimensions, the one with a higher-dimensional
parameter space will typically be disfavored, unless it
provides a much better fit to the data (i.e., has a large
$P_{\max}$) or it provides a reasonably good fit to the data
over most of the parameter space.
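
The heuristic estimate can be checked numerically in a toy one-parameter case. The sketch below assumes a Gaussian likelihood (so that $V_L = \sqrt{2\pi}\,w$ for width $w$) and a flat prior on an interval; both choices, and all names and numbers in the code, are illustrative assumptions rather than anything used in our analysis.

```python
import math

def evidence_flat_prior(log_like, lo, hi, n=20001):
    """Trapezoid estimate of E = (1/V_p) * integral of P(D|theta) on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    vals = [math.exp(log_like(lo + i * h)) for i in range(n)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral / (hi - lo)          # flat prior density is 1/V_p

def gaussian_log_like(theta, peak=1.0, centre=0.0, width=0.1):
    """Toy log-likelihood with maximum value `peak` at `centre`."""
    return math.log(peak) - 0.5 * ((theta - centre) / width) ** 2

if __name__ == "__main__":
    narrow = evidence_flat_prior(gaussian_log_like, -5.0, 5.0)
    wide = evidence_flat_prior(gaussian_log_like, -50.0, 50.0)
    print(narrow, wide, narrow / wide)
```

Widening the prior range tenfold while leaving the fit unchanged lowers the evidence tenfold, which is the Occam's-razor penalty described above.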

Below, we will use Bayesian evidence ratios to assess 
whether the multipole alignment anomaly significantly 
favors the adoption of the more complicated broken- 
isotropy models, using the following procedure. We define
a statistic $s$ that describes the anomaly. Since the
null (statistically isotropic) hypothesis $M_0$ has no free
parameters, the evidence for it is simply the probability
density of the statistic under that hypothesis:

$$E_0 = P(s|M_0). \tag{16}$$

For some choices of statistic, this probability density can 
be computed analytically, but in general it must be esti- 
mated from simulations. 

We now consider the evidence $E_1 = E(M_1)$ for the
broken-isotropy model. Let us first examine the models
in which $f$ has only power in one multipole (i.e., the
dipole-only and quadrupole-only models). The parameter
space $\theta$ for this model consists of the single parameter
$\sigma_j$, where $j = 1$ for the dipole-only model and 2 for
the quadrupole-only model. To compute the evidence for
this model, we must choose a prior $\pi(\sigma_j)$. We adopt a
uniform prior on some range $\sigma_j \in [0, \sigma_{\max}]$:

$$\pi(\sigma_j) = \begin{cases} \sigma_{\max}^{-1} & 0 \le \sigma_j \le \sigma_{\max}, \\ 0 & \text{otherwise.} \end{cases} \tag{17}$$

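For the dipole-quadrupole model, we follow a similar procedure,
adopting a prior on $\theta = (\sigma_1, \sigma_2)$ of

$$\pi(\sigma_1,\sigma_2) = \begin{cases} \sigma_{\max}^{-2} & 0 \le \sigma_1, \sigma_2 \le \sigma_{\max}, \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$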

Since it is not obvious what cutoff $\sigma_{\max}$ to choose, we
plot the evidence ratio $E_1/E_0$ as a function of this parameter.
We regard the maximum value of the evidence
ratio as an upper bound on the true evidence ratio. From
the heuristic argument above we expect the evidence ratio
to decline for very large values of $\sigma_{\max}$, since these
models presumably have "wasted" parameter space.
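
For the one-parameter models, the dependence of the evidence ratio on the cutoff can be sketched as a running integral of the likelihood ratio divided by the cutoff. The code below is a minimal illustration assuming a uniform prior on $[0, \sigma_{\max}]$ and linear interpolation between grid points (the trapezoid rule); the function name and the grid values in the example are made up for illustration and are not our simulation results.

```python
def evidence_ratio_curve(sigmas, like_ratio):
    """Evidence ratio E1/E0 for each cutoff sigma_max = sigmas[k], k >= 1.

    sigmas     : increasing grid of sigma values starting at 0
    like_ratio : P(s | sigma) / P(s | M0) evaluated on that grid
    """
    ratios = []
    integral = 0.0
    for k in range(1, len(sigmas)):
        step = sigmas[k] - sigmas[k - 1]
        integral += 0.5 * step * (like_ratio[k] + like_ratio[k - 1])
        ratios.append(integral / sigmas[k])   # uniform prior 1/sigma_max
    return ratios

if __name__ == "__main__":
    # Hypothetical likelihood ratios peaking near sigma = 1.
    grid = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(evidence_ratio_curve(grid, [1.0, 3.0, 1.0, 1.0, 1.0]))
```

In the hypothetical example the ratio falls from 2 toward 1 as the cutoff grows, showing how a peak in the likelihood is diluted once the prior extends over "wasted" parameter space.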



IV. RESULTS 

Various statistics have been used in the past to characterize
the observed alignment of the $l = 2$ and $l = 3$
multipoles in the WMAP data. We focus on two categories
of statistic: one based on finding the directions
that maximize the angular momentum [8] for each multipole
(Section IV A), and one based on multipole vectors
[10] (Section IV B).

A. Angular momentum 

For any given multipole $l$, consider the map obtained
by keeping just the corresponding coefficients in the spherical
harmonic expansion,

$$T_l(\theta,\phi) = \sum_{m=-l}^{l} a_{lm} Y_{lm}(\theta,\phi). \tag{19}$$

The maps $T_2$ and $T_3$ are each observed to have fluctuations
that lie predominantly in a single plane, and moreover
the planes associated with these two multipoles seem
to be aligned [8-11]. The idea of the maximum-angular-momentum
statistic is to quantify that alignment by
defining for each $l$ an axis perpendicular to the plane
picked out by the map $T_l$.

Consider a particular map $T_l$. For any given direction,
specified by a unit vector $\hat{n}$, we imagine rotating the map
to bring $\hat{n}$ to the $z$ axis. Let $\tilde{a}_{lm}$ represent the spherical
harmonic coefficients in the rotated coordinate system, which can be
efficiently computed by applying an appropriate Wigner
D matrix to the unrotated $a_{lm}$'s [33]. We compute
the "angular momentum" of the rotated map about the
$z$ axis:

$$L_z^2(\hat{n}) = \sum_{m=-l}^{l} m^2 |\tilde{a}_{lm}|^2. \tag{20}$$

The direction $\hat{n}$ that maximizes $L_z^2$ is taken to be the axis
$\hat{n}_l$ for the given multipole. Note that because $L_z^2(\hat{n}) =
L_z^2(-\hat{n})$, the vector $\hat{n}_l$ is only defined up to an overall
sign.

We use the statistic $\lambda = |\hat{n}_2 \cdot \hat{n}_3|$ to assess the degree
to which the fluctuations in the quadrupole and octupole
are aligned. In any statistically isotropic model,
we expect the directions $\hat{n}_l$ to be independent and uniformly
distributed over the unit sphere, which implies
that $\lambda$ is uniformly distributed on the interval [0,1]. The
value in the actual WMAP data is surprisingly large at
$\lambda_{\rm WMAP} = 0.985$.

For any given choice of parameters $(\sigma_1, \sigma_2)$, we can
simulate a large number of maps and determine the probability
density function (pdf) of the statistic $\lambda$. Specifically,
we can estimate the average pdf in an interval of
width $\delta\lambda$ around the value $\lambda_{\rm WMAP}$ simply by finding the
fraction of all simulations yielding values in the range
$[\lambda_{\rm WMAP} - \frac{1}{2}\delta\lambda,\ \lambda_{\rm WMAP} + \frac{1}{2}\delta\lambda]$ and dividing by $\delta\lambda$.
Figure 1(a) shows the resulting
pdfs, based on $10^4$ simulations for each point in
parameter space, with $\delta\lambda = 0.02$. The Poisson noise in
this estimation process is visible as ~7% scatter in the
points in this plot. Histograms of the simulation results
confirm that the pdfs are smooth over scales much larger
than $\delta\lambda$, so interpreting the average pdf as the pdf at the
given point is reasonable.

FIG. 1: (a) Values of the probability density function (pdf) for the multipole alignment statistic $\lambda$, evaluated in an interval of
width $\delta\lambda = 0.02$ about $\lambda = 0.985$, from $10^4$ simulations for each choice of parameter values, plotted against $\log_{10}\sigma_1$. The different curves correspond
to $\sigma_2 = (0, e^{-1}, 1, e, e^2, e^3) = (0, 0.37, 1, 2.7, 7.4, 20)$ (from highest to lowest pdf at the right of the plot). (b) The Bayesian
evidence for the anisotropic models, plotted against $\log_{10}\sigma_{\max}$. The solid curve is for the dipole-only model, the dashed curve is for the dipole-quadrupole
model, and the dotted curve is for the quadrupole-only model.

Since the pdf under the null hypothesis is equal to
1, this quantity can be interpreted as a Bayesian evidence
ratio comparing the model with the given values
of $(\sigma_1, \sigma_2)$ to the null hypothesis.
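
The counting estimate of the pdf can be illustrated for the null hypothesis, where the answer is known to be 1. The sketch below is a simplified stand-in for the full procedure: rather than simulating maps and maximizing the angular momentum, it assumes the two axes are independent and uniformly distributed on the sphere (as they are in any statistically isotropic model) and draws them directly. All function names are hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)

def binned_pdf(samples, centre, width):
    """Fraction of samples within width/2 of centre, per unit width."""
    lo, hi = centre - 0.5 * width, centre + 0.5 * width
    count = sum(1 for s in samples if lo <= s <= hi)
    return count / (len(samples) * width)

def null_alignment_samples(n, seed=0):
    """Samples of |n2 . n3| for independent, isotropically distributed axes."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        a, b = random_unit_vector(rng), random_unit_vector(rng)
        out.append(abs(sum(x * y for x, y in zip(a, b))))
    return out

if __name__ == "__main__":
    samples = null_alignment_samples(10000)
    print(binned_pdf(samples, 0.985, 0.02))
```

With $10^4$ samples and a bin of width 0.02, about 200 samples are expected in the bin, so the estimate scatters around 1 by roughly $200^{-1/2} \approx 7\%$, the same level of Poisson noise noted above.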

As Figure 1(a) shows, for some choices of parameter,
the evidence ratio exceeds 3. However, this overstates
the evidence in favor of the broken-isotropy model. As
described in Section III, the correct procedure is to treat
$\sigma_1$, $\sigma_2$ as unknown parameters with a given prior distribution,
and integrate over that prior to get the evidence.
The integration is performed numerically, after interpolating
between the likelihood estimates found for the various
values of $(\sigma_1, \sigma_2)$.

Figure 1(b) shows the result of this calculation. The
quantity on the horizontal axis is the prior cutoff $\sigma_{\max}$ of
equation (17) or (18). Because each Bayesian evidence
ratio is an integral over the likelihood function, the effect
of Poisson noise due to the finite number of simulations
is greatly reduced.

In the dipole-quadrupole case (where both $\sigma_1$, $\sigma_2$ are
free parameters), the Bayesian evidence ratio has a maximum
value of ~2.4 at $\sigma_{\max} \approx 1$. (Recall that, as
noted in Section II, $\sigma = 1$ corresponds to only 10-20%
modulation.) The dipole-only model (in which only $\sigma_1$
varies) fares a bit better, with evidence ratio peaking
at ~2.7. Even if we take this maximum value as the
true evidence ratio, it is still only modest support for
the broken-isotropy model. The quadrupole-only model
shows no significant improvement at all over the standard
model (as we could have predicted from Figure 1(a),
in which all curves approach the standard-model value of
1 for low $\sigma_1$).



B. Multipole vectors 

To test the robustness of this result, we can use a dif- 
ferent approach to quantify the multipole alignment. For 
each multipole $l$, the map $T_l$ can be used to define $l$ unit
vectors, generally called "multipole vectors" Q- The 
multipole vectors for each I can be used to characterize 
the orientation of that multipole, and thus to character- 
ize the quadrupole-octupole alignment. 

There are multiple different ways of using the multi- 
pole vectors to define an alignment statistic. The original 
work on the subject Q used an elaborate procedure in- 
volving the assessment of several different combinations 
of dot and cross products of the multipole vectors. Subse- 
quent work by members of the same group [lj| focused on 
a smaller subset of these possibilities. We have chosen to 
implement the "robust and more conservative" statistic 
used in the latter work. We now describe this statistic. 

Let $\mathbf{v}^{(l,j)}$ ($1 \le j \le l$) represent the $j$th multipole vector
for multipole $l$. For any given $l$, we consider all
$l(l-1)/2$ distinct cross products of multipole vectors
$\mathbf{w}^{(l,i,j)} = \mathbf{v}^{(l,i)} \times \mathbf{v}^{(l,j)}$ ($1 \le i < j \le l$). Alignment of
the quadrupole and octupole planes can be characterized
by the absolute values of the dot product of the one
quadrupole cross product, $\mathbf{w}^{(2,1,2)}$, with each of the three
octupole cross products $\mathbf{w}^{(3,i,j)}$. (The absolute value is
necessary because the multipole vectors, and hence the
cross products, are specified only up to an overall sign.)

FIG. 2: (a) Probability density for the Schwarz et al. [10] multipole vector statistic $S$, evaluated at the value found in
the WMAP data, plotted against $\log_{10}\sigma_1$. From top to bottom at $\log_{10}\sigma_1 = 0$, the curves correspond to $\sigma_2 = (0, 0.25, 0.50, 1.0, 2.0, 4.0, 16)$. (b)
Bayesian evidence for anisotropic models, plotted against $\log_{10}\sigma_{\max}$. Results are plotted for the dipole-quadrupole (solid), dipole-only (dashed), and
quadrupole-only (dotted) models.



Following ref. [10], we therefore define a statistic

$$S = |\mathbf{w}^{(2,1,2)} \cdot \mathbf{w}^{(3,1,2)}| + |\mathbf{w}^{(2,1,2)} \cdot \mathbf{w}^{(3,1,3)}| + |\mathbf{w}^{(2,1,2)} \cdot \mathbf{w}^{(3,2,3)}|. \tag{21}$$

The value of this statistic for the WMAP data is
$S_{\rm WMAP} = 2.233$. Based on Monte Carlo simulations, we
find this to be inconsistent with the standard isotropic
model at 99.3% confidence. These values differ only
slightly from those found in ref. [10] ($S_{\rm WMAP} = 2.396$,
ruled out at 99.87% confidence), which used an earlier
data release and a different foreground removal process.
Figure 2 shows the result of Bayesian evidence calculations
based on this statistic. At each point $(\sigma_1, \sigma_2)$
in parameter space, the pdf (i.e., the likelihood function)
was evaluated from 10 simulations, by counting
the number of times the statistic was found in an interval
of width $\delta S = 0.15$ about the value found in the
WMAP data. The results are qualitatively consistent 
with those based on the angular momentum statistic, al- 
though with slightly higher evidence ratios (i.e., slightly 
more favorable to the broken-isotropy models). 

There is of course some arbitrariness in the choice of 
the statistic S. In addition to S, we devised an alterna- 
tive set of statistics based on the multipole vectors. The 
results based on these statistics can be viewed as a test 
of the robustness of the results above. 

In defining our statistics we were guided by a desire 
to characterize the observed quadrupole-octupole alignment
and the fact that the octupole has been characterized as 
unusually planar. Since we were guided by these already- 
observed facts, of course, our choices are subject to the 
same a-posteriori-statistics criticism as most other work 
in this area. We did, however, attempt to avoid exacer- 
bating this problem with further a posteriori choices: we 
devised our statistics blindly and used only one statistic 
to characterize each of these two phenomena. 

The two multipole vectors at $l = 2$ define a plane, and
we let $\hat{n}_2$ be the unit vector perpendicular to that plane.
To assess the multipole alignment, we need to define a
similar unit vector based on the three $l = 3$ vectors. We
define $\hat{n}_3$ to be the unit vector that is as nearly as possible
perpendicular to these vectors by minimizing the
quantity

$$p = \sum_{j=1}^{3} \left(\hat{n}_3 \cdot \mathbf{v}^{(3,j)}\right)^2. \tag{22}$$

As in the angular momentum case, we define an align- 
ment statistic to be the absolute value of the dot product 
of these vectors: 

$$A = |\hat{n}_2 \cdot \hat{n}_3|. \tag{23}$$

In addition, the statistic p can be thought of as char- 
acterizing the octupole planarity, with low values of p 
corresponding to more planar octupole patterns. 
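
A direction that is "as nearly as possible perpendicular" to a set of vectors can be found as an eigenvector problem. The sketch below assumes that $p$ is the sum of squared dot products of the trial direction with the three octupole multipole vectors, in which case the minimizer is the eigenvector of $M = \sum_j \mathbf{v}_j \mathbf{v}_j^T$ with the smallest eigenvalue. The power iteration used here, and the function name, are our illustrative choices and not necessarily the method used to produce the results in this paper.

```python
import math

def plane_normal(vectors, iterations=500):
    """Unit vector n minimizing p = sum_j (n . v_j)^2, and the minimum p."""
    # M = sum_j v_j v_j^T (3x3, symmetric, positive semidefinite).
    M = [[sum(v[a] * v[b] for v in vectors) for b in range(3)]
         for a in range(3)]
    trace = M[0][0] + M[1][1] + M[2][2]
    # Power iteration on B = trace*I - M: its dominant eigenvector is the
    # eigenvector of M with the smallest eigenvalue.
    B = [[(trace if a == b else 0.0) - M[a][b] for b in range(3)]
         for a in range(3)]
    n = [0.3, 0.5, 0.8]                      # arbitrary starting direction
    for _ in range(iterations):
        n = [sum(B[a][b] * n[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in n))
        n = [x / norm for x in n]
    p = sum(sum(n[a] * v[a] for a in range(3)) ** 2 for v in vectors)
    return n, p

if __name__ == "__main__":
    r = 2.0 ** -0.5
    n3, p = plane_normal([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (r, r, 0.0)])
    print(n3, p)
```

For three vectors lying exactly in the $x$-$y$ plane the routine returns $\hat{n}_3 = \pm\hat{z}$ with $p = 0$, the fully planar limit; three mutually orthogonal unit vectors give $p = 1$ for every direction.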

The value of A for the real data is 0.97, which is some- 
what anomalously high since a uniform distribution on 
[0,1] is expected in the standard model. The statistic p, 
on the other hand, does not show anomalous planarity: 
its value in the real data is 0.31, lying near the middle 
of the distribution in simulations based on the standard 
model. 

Since p is quite consistent with the standard model, we 
would not expect its inclusion in our analysis to improve 
the evidence for any nonstandard models. For complete- 
ness, we performed the Bayesian evidence calculations 
using the joint probability density on $A$ and $p$ as our input
likelihood function, and also using the probability
densities on $A$ and $p$ separately.

The probability densities for each parameter were cal- 
culated as with the previous statistics, by counting the 
number of simulations yielding values in a small interval 
about the value in the true data. In this case, we used $10^5$
simulations at each data point, with $\delta A = \delta p = 0.005$.

FIG. 3: Bayesian evidence ratios calculated using multipole vectors. In (a), the joint probability density for $(A, p)$, the alignment
and planarity statistics, was used. In (b), only the statistic $A$ was used. Results are plotted for the dipole-quadrupole (solid),
dipole-only (dashed), and quadrupole-only (dotted) models.



The joint probability density was estimated as the prod- 
uct of the individual pdfs. In principle, the two statistics 
could be correlated, in which case this would not be cor- 
rect. In practice, however, correlations were found to be 
negligible for the models under consideration; in spot- 
checks this approximation was found to be quite good. 

Figure 3 shows the result of Bayesian evidence computations
based on this statistic. As expected, the results
vary only slightly depending on whether the planarity
statistic is included. In either case, the strongest evidence
ratio comes at $\sigma_{\max} \approx 1$ in the dipole-only model,
but as before the evidence ratios are modest, peaking at
~2.7 including both statistics and ~2.5 using only the
alignment statistic. Results for the planarity-only
statistic are not shown but yield no significant enhancement
in the evidence ratio.

The strong similarity in all of the evidence ratio plots 
suggests that our results are insensitive to the precise 
way that the multipole alignment is characterized. 



V. CORRECTIONS TO COSMOLOGICAL 
PARAMETERS IN ANISOTROPIC MODELS 

In the anisotropic models under consideration, the
power spectrum $C_l$ is modified by the modulation function
$f$. We assume that the original, unmodulated power
spectrum $C_l^{(0)}$, as opposed to the measured power spectrum
$C_l$, is produced by the usual standard-model mechanism.
In the anisotropic models, therefore, the cosmological
parameters estimated from the power spectrum
will differ from those in the standard model. In this section
we estimate the changes in parameters as functions
of $\sigma_1$ and $\sigma_2$. To be specific, we will assume that $C_l$ has
been estimated from the data and used to derive parameter
estimates under the standard assumptions of
isotropy and Gaussianity. We will compute the corrections
that must be applied to these parameter estimates
for nonzero $\sigma_1$, $\sigma_2$. We find that for reasonable values
of $\sigma_1$, $\sigma_2$, all parameters except the overall normalization
undergo very small changes.

The changes in parameter values we compute depend 
on the assumption that the modulation is the same across 
all angular scales. If the modulation exists only on large 
scales, with smaller scales described by the unmodulated 
standard model, then the changes in parameter estimates 
will be even smaller than those found here. 

To estimate the changes in parameter values, we as- 
sume that the unmodulated CMB power spectrum is 
given by the standard model and can be calculated from, 
e.g., CMBFAST [Hj]. We begin by deriving the
relationship between the modulated (i.e., observed) power
spectrum $C_l$, the unmodulated power spectrum $C_l^{(0)}$, and the
power spectrum $C_l^{(f)}$ of the modulating function. We
begin from equation

$$a_{lm} = \sum_{l_1,m_1} \sum_{l_2,m_2} a^{(0)}_{l_1 m_1} f_{l_2 m_2} I_{l l_1 l_2 m m_1 m_2}, \tag{24}$$

where $I_{l l_1 l_2 m m_1 m_2}$ is defined in equation (fTTj).

In an isotropic model, the power spectrum is given
by $C_l = \langle |a_{lm}|^2 \rangle$, which is independent of $m$. In an
anisotropic model, this quantity is not necessarily
independent of $m$, so we define the power spectrum to be the
average over $m$:

$$C_l = \frac{1}{2l+1} \sum_m \langle |a_{lm}|^2 \rangle. \tag{25}$$

We substitute equation (24) into this expression. We
then make use of the fact that both the $a^{(0)}_{lm}$ and $f_{lm}$
coefficients are drawn from isotropic Gaussian random
processes, which implies that different coefficients are
uncorrelated:

$$\langle a^{(0)}_{lm} a^{(0)*}_{l'm'} \rangle = C_l^{(0)} \delta_{ll'} \delta_{mm'}, \tag{26}$$

$$\langle f_{lm} f^{*}_{l'm'} \rangle = C_l^{(f)} \delta_{ll'} \delta_{mm'}, \tag{27}$$

$$\langle a^{(0)}_{lm} f^{*}_{l'm'} \rangle = 0. \tag{28}$$

The result is

$$C_l = \sum_{l_1,l_2} C^{(0)}_{l_1} C^{(f)}_{l_2} \left( \frac{1}{2l+1} \sum_{m_1,m_2,m} |I_{l l_1 l_2 m m_1 m_2}|^2 \right).$$

The sum inside the parentheses is over all $m$, $m_1$, and
$m_2$ values that make the Wigner 3-j symbols physical.

We assume that $C^{(f)}_{l_2} = 0$ for $l_2 > 2$, so that the double
sum above becomes three single sums:

$$\begin{aligned}
C_l = {} & C^{(f)}_0 \sum_{l_1} C^{(0)}_{l_1} \left( \frac{1}{2l+1} \sum_{m_1,m_2,m} |I_{l l_1 0 m m_1 m_2}|^2 \right) \\
& + C^{(f)}_1 \sum_{l_1} C^{(0)}_{l_1} \left( \frac{1}{2l+1} \sum_{m_1,m_2,m} |I_{l l_1 1 m m_1 m_2}|^2 \right) \\
& + C^{(f)}_2 \sum_{l_1} C^{(0)}_{l_1} \left( \frac{1}{2l+1} \sum_{m_1,m_2,m} |I_{l l_1 2 m m_1 m_2}|^2 \right).
\end{aligned} \tag{31}$$

Because of the triangle inequality on the 3-j symbols, the
first sum contains only one nonzero term ($l_1 = l$), and
not surprisingly this term reduces to $C_l^{(0)}$. The second
and third sums similarly have only a few nonzero terms.

Substituting $C^{(f)}_1 = \sigma_1^2/2$ and $C^{(f)}_2 = \sigma_2^2/6$, we find
that the difference between modulated and unmodulated
power spectra is

$$\begin{aligned}
\delta C_l = C_l - C_l^{(0)} = {} & \frac{\sigma_1^2}{2} \sum_{l_1} C^{(0)}_{l_1} \left( \frac{1}{2l+1} \sum_{m_1,m_2,m} |I_{l l_1 1 m m_1 m_2}|^2 \right) \\
& + \frac{\sigma_2^2}{6} \sum_{l_1} C^{(0)}_{l_1} \left( \frac{1}{2l+1} \sum_{m_1,m_2,m} |I_{l l_1 2 m m_1 m_2}|^2 \right).
\end{aligned} \tag{32}$$

We see that $\delta C_l$ is a linear function of $\sigma_1^2$ and $\sigma_2^2$. For
any given model, we can calculate the $\delta C_l$ contributions
from $\sigma_1^2$ and $\sigma_2^2$ independently.

FIG. 4: Power spectrum changes. The solid curves are the $\delta C_l$'s resulting from nonzero $\sigma_1$ or $\sigma_2$ as described in equation
(32). The dot-dashed curves are the results of the linear approximation (33), with best-fit $\delta g_i$. The dashed curves are
the $\delta C_l$'s resulting from recomputing the power spectrum with the modified parameter values. Power spectra are computed in
dimensionless $(\Delta T/T)^2$ form. In these units, the CMB power is of order $10^{10}\, l(l+1)C_l/2\pi \sim 1.5$-$8$ over the multipole range of
interest; thus in the top panels, the deviations in the power spectrum are all quite small. The different approximations agree
reasonably well, showing that the approximations in this section are adequate.
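
As a rough illustration of how the $\sigma_1^2$ and $\sigma_2^2$ contributions to $\delta C_l$ can be evaluated, the following Python sketch sums the dipole and quadrupole terms directly. It rests on an assumption that is not confirmed by the text above: that $I_{l l_1 l_2 m m_1 m_2}$ is the standard Gaunt integral of three orthonormal spherical harmonics, in which case the quantity in parentheses reduces to $(2l_1+1)(2l_2+1)/(4\pi)$ times the square of the 3-j symbol with all $m$ values zero. The function names and the toy input spectrum `toy_cl0` are hypothetical; a real calculation would take $C_l^{(0)}$ from a Boltzmann code such as CMBFAST.

```python
from fractions import Fraction
from math import factorial, pi


def three_j_000_squared(l1, l2, l3):
    """Exact square of the Wigner 3-j symbol (l1 l2 l3; 0 0 0)."""
    total = l1 + l2 + l3
    # The symbol vanishes for odd l1+l2+l3 or when the triangle inequality fails.
    if total % 2 or l3 > l1 + l2 or l3 < abs(l1 - l2):
        return Fraction(0)
    g = total // 2
    triangle = Fraction(
        factorial(total - 2 * l1) * factorial(total - 2 * l2) * factorial(total - 2 * l3),
        factorial(total + 1),
    )
    ratio = Fraction(factorial(g), factorial(g - l1) * factorial(g - l2) * factorial(g - l3))
    return triangle * ratio ** 2


def coupling(l, l1, l2):
    """(1/(2l+1)) * sum over m, m1, m2 of |I|^2, ASSUMING I is the Gaunt integral."""
    return (2 * l1 + 1) * (2 * l2 + 1) / (4 * pi) * float(three_j_000_squared(l, l1, l2))


def delta_cl(l, cl0, sigma1_sq, sigma2_sq):
    """Dipole plus quadrupole contribution to C_l; cl0 is a callable l1 -> C^(0)_{l1}."""
    cf = {1: sigma1_sq / 2.0, 2: sigma2_sq / 6.0}  # C^(f)_1 and C^(f)_2 as quoted in the text
    total = 0.0
    for l2, cf_l2 in cf.items():
        for l1 in range(abs(l - l2), l + l2 + 1):  # range allowed by the triangle inequality
            total += cf_l2 * cl0(l1) * coupling(l, l1, l2)
    return total


def toy_cl0(l):
    """Hypothetical scale-invariant toy spectrum, not a CMBFAST output."""
    return 1.0 / (l * (l + 1)) if l >= 2 else 0.0


# The two contributions can be computed independently and simply added.
print(delta_cl(10, toy_cl0, 0.1, 0.0), delta_cl(10, toy_cl0, 0.0, 0.1))
```

Exact rational arithmetic is used for the 3-j symbols so that the factorials do not lose precision at high $l$; only the final product is converted to floating point.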

Assuming that the perturbation from the applied field
is small, we expect the change in the power spectrum, and
hence the change in the inferred parameter values, to be
small. In the standard ΛCDM paradigm, the observed
power spectrum depends on six parameters: $\Omega_b$ (baryon
density), $\Omega_{\rm cdm}$ (dark matter density), $\Omega_\Lambda$ (vacuum
energy density), $n$ (spectral index), $H_0$ (Hubble constant,
$h = H_0/(100\ {\rm km\,s^{-1}\,Mpc^{-1}})$), and $A$ (normalization
constant for all $C_l$, relative to the current best fit values
from WMAP). Calling these parameters $g_1, \ldots, g_6$, we
have to a good approximation

$$\delta C_l = \sum_{i=1}^{6} \frac{\partial C_l}{\partial g_i}\, \delta g_i. \tag{33}$$

Setting equations (32) and (33) equal, and splitting the
parameter variations into terms that depend on $\sigma_1$ and
$\sigma_2$, we can write

$$\sum_{i=1}^{6} \frac{\partial C_l}{\partial g_i}\, \delta g_{i,\sigma_1} = \frac{1}{2} \sum_{l_1} C^{(0)}_{l_1} \left( \frac{1}{2l+1} \sum_{m_1,m_2,m} |I_{l l_1 1 m m_1 m_2}|^2 \right), \tag{34}$$

$$\sum_{i=1}^{6} \frac{\partial C_l}{\partial g_i}\, \delta g_{i,\sigma_2} = \frac{1}{6} \sum_{l_1} C^{(0)}_{l_1} \left( \frac{1}{2l+1} \sum_{m_1,m_2,m} |I_{l l_1 2 m m_1 m_2}|^2 \right), \tag{35}$$

$$\delta g_i = \delta g_{i,\sigma_1} \sigma_1^2 + \delta g_{i,\sigma_2} \sigma_2^2. \tag{36}$$

We use Euler's method (a finite difference) to approximate
$\partial C_l/\partial g_i$, starting from the current best fit values
$g^{(0)} = (\Omega_b^{(0)}, \Omega_{\rm cdm}^{(0)}, \Omega_\Lambda^{(0)}, n^{(0)}, h^{(0)}, A^{(0)}) =
(0.046, 0.224, 0.73, 0.99, 0.72, 1)$. We vary each
parameter independently by about 2% of the original
value, calculate the resulting $C_l$'s with CMBFAST, and
obtain $\delta C_l$ by calculating the difference between the new
$C_l$'s and the standard $C_l$'s. We thus obtain an estimate
of $\partial C_l/\partial g_i$. Using equation (32) and starting with the
standard-model parameter values, we compute the $\sigma_1^2$ and
$\sigma_2^2$ contributions to $\delta C_l$.

We can then find best-fit values of $\delta g_{i,\sigma_1}$ and $\delta g_{i,\sigma_2}$. We
perform a least-squares fit over the range $2 \le l \le 600$, with
weights given by the combination of cosmic variance and
noise errors for WMAP. To test the validity of this
procedure, we compute a new set of $C_l$'s using CMBFAST
with parameters given by $g^{(0)} + \delta g$. Figure 4 shows that
the fitting works very well, and that the linearity of the
$C_l$'s in the specific direction of $\delta g$ validates the
approximation in equation (33). For $\sigma_1, \sigma_2$ of order 1, linearity
starts to break down, but such large values are probably
unphysical in any case.

Numerically, in the linear regime the changes in
parameters can be calculated by

$$\begin{pmatrix} \delta\Omega_b \\ \delta\Omega_{\rm cdm} \\ \delta\Omega_\Lambda \\ \delta n \\ \delta h \\ \delta A \end{pmatrix}
= \begin{pmatrix}
3.86 \times 10^{-4} & 0.963 \times 10^{-4} \\
-6.33 \times 10^{-3} & -5.78 \times 10^{-3} \\
5.24 \times 10^{-3} & 4.26 \times 10^{-3} \\
-8.39 \times 10^{-3} & -7.88 \times 10^{-3} \\
4.74 \times 10^{-3} & 6.55 \times 10^{-3} \\
-0.285 & -0.753
\end{pmatrix}
\begin{pmatrix} \sigma_1^2 \\ \sigma_2^2 \end{pmatrix}.$$

In all cases except for the overall normalization $A$,
the parameter changes are small even for relatively large
$\sigma_1^2, \sigma_2^2 \sim 1$. Moreover, as can be seen in Figure 4, the
residuals $\delta C_l$ have similar shape to the input power
spectrum $C_l$ (although with a negative prefactor in one of
the two cases), indicating that the chief error in the linear
approximations in this section applies to the normalization.
We conclude that, in a model of the form considered here, one
should take care to recompute the overall normalization, which
affects the normalization of the matter power spectrum,
but that other parameters are likely to remain
approximately unchanged.



VI. FOREGROUNDS

The significance of the observed anomalies depends on 
the choice of data set (e.g., [25]). We chose to work in
spherical harmonic space, leading to the requirement of 
an all-sky data set. We thus worked with the WMAP 
ILC data. With this choice of data set, one must wonder 
about the effect of residual foreground contamination on 
our results. 

We can begin to assess these effects using a set of
10 000 publicly available ILC simulations [24]. For each
simulation, both the foreground-free input map and the
ILC reconstruction, with residual foreground
contamination, are provided. In each case, we computed the four
statistics discussed in this paper, namely the
angular-momentum statistic A, the Schwarz et al. multipole
vector statistic S, the multipole vector alignment statistic
A, and the planarity statistic p. Figure 5 shows a
comparison of the input and ILC maps for each statistic.
In each case, there is a strong correlation, but the scatter
is considerable.

FIG. 5: Effect of foreground contamination. The top row shows scatter plots indicating the relation between statistics derived
from simulated [24] input (foreground-free) maps and ILC reconstructions that include residual foreground contamination. The
bottom row shows only those simulations for which the ILC statistics lie in the top 1% of their distributions. (The planarity
statistic p is not shown in the bottom row, as it does not yield an anomalously high value in the actual data.)

The probability density for each of the four statistics
undergoes no significant change between the input and
ILC ensembles. (In other words, in each of the four plots
on the top row of Figure 5, histograms of the x and y
values look essentially identical.) This can be quantified in a
variety of ways. Since we are most interested in the
upper tail of the distribution of each statistic (except for
p, which has negligible effect on any of our conclusions
anyway), we look at the behavior of the distributions near
the 99th percentile. For each statistic, we find the
99th-percentile value in the 10 000 ILC maps, and count the
number of input maps lying above that value. (The results
are essentially identical if the two roles are reversed.) If
the input and ILC probability densities are the same, we
expect to find 100 in each case. The actual values found
deviate from this expected value by 1, 1, 5, -6 for
A, S, A, p respectively. All are well within the 10%
fluctuation level expected due to Poisson noise.
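
The tail-count test just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the arrays `input_vals` and `ilc_vals` are hypothetical stand-ins (correlated Gaussian pairs with identical marginal distributions, with an assumed correlation `rho`) for a statistic evaluated on the 10 000 simulated input and ILC maps, which are not reproduced here.

```python
import numpy as np


def tail_count_excess(input_vals, ilc_vals, percentile=99.0):
    """Return (observed - expected) count of input values above the ILC percentile,
    together with the Poisson standard deviation of the expected count."""
    threshold = np.percentile(ilc_vals, percentile)
    observed = int(np.sum(np.asarray(input_vals) > threshold))
    expected = len(input_vals) * (100.0 - percentile) / 100.0
    return observed - expected, np.sqrt(expected)


rng = np.random.default_rng(0)
n_sims = 10_000
# Hypothetical stand-in data: same marginal distribution, correlation rho (assumed).
rho = 0.9
input_vals = rng.standard_normal(n_sims)
ilc_vals = rho * input_vals + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_sims)

excess, poisson_sigma = tail_count_excess(input_vals, ilc_vals)
print(f"excess = {excess:+.0f}, Poisson sigma = {poisson_sigma:.0f}")
```

With 10 000 realizations the expected count is 100, so the Poisson standard deviation is 10, which is the 10% fluctuation level quoted above.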

From this test, we can conclude that foregrounds 
do not significantly alter the statistical significance of 
anomalies based on these statistics. Due to the problem 
of a posteriori statistics, reasonable people can disagree 
about whether to take the ILC multipole alignment se- 
riously, but one's opinion on this question need not be 
altered by consideration of foreground contamination. 

In this paper we do not chiefly address the question 
of whether the multipole alignment is statistically signif- 
icant; on the contrary, we provisionally adopt the stance 
that it is and ask what form an explanation of it might 
take. For this sort of question, we need to go beyond the 
simple considerations above and consider the correlations 
between input and ILC maps. After all, nonstandard cos- 
mological models such as the broken-isotropy models we 
consider affect the probability of seeing multipole
alignments in the foreground-free ("input") maps, whereas the
likelihoods that form the basis of our evidence
calculations are based on the ILC map.

Once again, for the three statistics A, S, A that pri- 
marily affect our results, we are interested in the relation 
between input and ILC values near the upper end of the 
statistics' ranges. Specifically, we want to know whether 
the observed large ILC value implies a large input value 
in the foreground-free CMB. The bottom row of plots in
Figure [5] provides one qualitative way of addressing this 
question. For each statistic, we show a scatter plot com- 
paring input and ILC values as in the upper row, but 
showing only points corresponding to the top 1% of ILC 
values. Many points cluster near the right, indicating 
that a high ILC value is likely, but by no means certain, 
to have come from a high input value. 

Let us be slightly more quantitative. For any given
statistic, say A, we extract the realizations for which
the ILC maps are anomalously high, lying in the top
1% of the distribution. For these 100 realizations, we
find the value of the statistic in the input map, $A^*_{\rm input}$,
and look at its ranking in the full set of 10 000
input realizations. This gives the cumulative probability
$P_{\rm input} = \Pr[A_{\rm input} < A^*_{\rm input}]$ for each of the 100 input
maps. If foreground contamination were negligible, then
these 100 maps would lie in the top percentile of the
input distribution, i.e., all 100 $P_{\rm input}$ values would be above
0.99.

Figure 6 shows the result of this exercise for each of
the three statistics A, S, A. In each case, the results are
sorted by the value of the statistic in the input data.
The results show that the statistic S is least affected by
foreground contamination: if a realization lies in the top
1% of the ILC maps, there is a high probability that it
also lies near the top of the probability distribution of
the input maps. For the three statistics A, S, A,
the median values of $P_{\rm input}$ for the ILC top 1% maps are
93.2%, 98.5%, 89.0%, as compared to the value 99.5%
that would occur if there were no foregrounds.
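
The ranking procedure can be made concrete with a short Python sketch. As an assumption for illustration only, the simulated statistics are replaced by hypothetical correlated Gaussian pairs; the function name `input_percentiles_of_ilc_tail` is likewise our own label and does not come from the simulations of [24].

```python
import numpy as np


def input_percentiles_of_ilc_tail(input_vals, ilc_vals, top_fraction=0.01):
    """Sorted P_input values for the realizations in the top `top_fraction` of ILC values."""
    input_vals = np.asarray(input_vals)
    ilc_vals = np.asarray(ilc_vals)
    n = len(ilc_vals)
    n_top = int(round(top_fraction * n))
    top_idx = np.argsort(ilc_vals)[-n_top:]  # realizations with the highest ILC values
    # P_input: fraction of all input realizations strictly below each selected value.
    ranks = np.searchsorted(np.sort(input_vals), input_vals[top_idx], side="left")
    return np.sort(ranks / n)


rng = np.random.default_rng(1)
n_sims = 10_000
# Hypothetical stand-in data with an assumed input-ILC correlation of 0.9.
input_vals = rng.standard_normal(n_sims)
ilc_vals = 0.9 * input_vals + np.sqrt(1.0 - 0.9**2) * rng.standard_normal(n_sims)

p_clean = input_percentiles_of_ilc_tail(input_vals, input_vals)  # no foreground residual
p_noisy = input_percentiles_of_ilc_tail(input_vals, ilc_vals)
print(np.median(p_clean), np.median(p_noisy))
```

When the ILC and input values are identical, the 100 selected realizations occupy ranks 9900 through 9999, so the median of $P_{\rm input}$ is 0.99495, matching the 99.5% reference value quoted above. Any scatter between input and ILC values pulls the median below that value.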

Generically, if the correlation between input and ILC 
maps is weak, then we would expect the enhanced
likelihood and Bayesian evidence results of Section IV to
be overestimates of the correct results. Intuitively, this 
seems clear: if the connection between the true CMB and 
the observed ILC data is weak, then so is our ability to 
draw cosmological conclusions from the ILC data. We 
can express this idea more formally as follows. Our theo- 
retical models allow us to calculate probability distribu- 
tions for the "input" data (i.e., the pure CMB), while our 
observations are of the ILC data. The correct procedure, 
therefore, is to convert the input probability distributions 
into ILC probability distributions by convolution with a 
conditional probability function P(ILC|Input). Such a 
convolution would smooth out variations in likelihood. 
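
A toy numerical example may help to show how such a convolution acts. Everything in the following Python sketch is an assumption made for illustration: the two input distributions are simple exponentials, the conditional probability P(ILC|Input) is taken to be a Gaussian scatter of width 0.1 in the statistic, and the "observed" value 0.6 is arbitrary. None of these choices are derived from the ILC simulations. Smoothing need not lower a likelihood ratio at every point in general, but in this example the ratio at the observed value moves toward unity.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 101)  # hypothetical bins for a statistic


def normalized(p):
    """Normalize a discrete distribution to unit sum."""
    return p / p.sum()


p_std_input = normalized(np.exp(-x / 0.1))    # toy standard-model distribution
p_aniso_input = normalized(np.exp(-x / 0.2))  # toy anisotropic model: heavier upper tail

# Toy conditional probability P(ILC | Input): Gaussian scatter of assumed width 0.1.
# Rows index the ILC bin, columns index the input bin.
kernel = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
kernel /= kernel.sum(axis=0, keepdims=True)   # each column sums to 1 over ILC bins

p_std_ilc = kernel @ p_std_input
p_aniso_ilc = kernel @ p_aniso_input

obs = np.argmin(np.abs(x - 0.6))              # hypothetical observed (high) value
ratio_input = p_aniso_input[obs] / p_std_input[obs]
ratio_ilc = p_aniso_ilc[obs] / p_std_ilc[obs]
print(ratio_input, ratio_ilc)
```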

We conclude, therefore, that because of foreground
contamination, the results shown in Section IV should
be regarded as upper limits. The effect of foregrounds on
the results is difficult to quantify, but based on Figure 5
we expect it to be smallest for the results based on the
Schwarz et al. statistic S.

FIG. 6: For the three statistics A (solid), S (dashed), A (dotted),
we select the top 1% of ILC simulations, and determine
the cumulative probability $P_{\rm input}$ of the statistic in the input
map. The values are sorted and plotted against rank. In the
absence of any effect from foregrounds, ILC and input maps
would be identical, and the result would be a straight line
extending from 0.99 to 1 (dot-dashed line).



VII. DISCUSSION 

The various anomalies that have been noted in the 
large-angle CMB may provide hints of departures from 
the standard cosmological model, possibly including vi- 
olations of statistical isotropy. Although the statistical 
significance of these anomalies is difficult or even impossi- 
ble to quantify a posteriori, these possibilities are exciting 
enough to warrant closer examination. 

We have considered several classes of physically
motivated models that might explain the anomalies. We
have calculated Bayesian evidence ratios to assess the
degree to which the purported anomalies in the multipoles
$l = 2, 3$ favor the anisotropic models over the standard
model.

According to the pioneering work of Jeffreys [35], a
Bayesian evidence ratio constitutes "substantial"
evidence if ln A > 1 and "strong" evidence if ln A > 2.5.
As the results in Section IV make clear (note that
what is plotted in each case is A, not ln A), only for the
most judicious choice of prior do the tests performed here
reach the "substantial" level, and they never come close
to being "strong."

Of course, Jeffreys's criteria are somewhat arbitrary, 
but in this case they seem to describe the situation fairly 
well. Recall that the evidence ratio A is simply the factor 
by which the ratio of prior probabilities must be adjusted, 
in the light of the observations, in order to get the poste- 
rior probability ratio. Presumably, the prior probability 
distribution assigns very low weight to the less natural 
anisotropic models, so even after applying an evidence 
ratio A ~ 3, the anisotropic models are still considered 



unlikely. One would require an exponentially large evi- 
dence ratio before assigning significant probability to the 
anisotropic models. 
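
The arithmetic behind this argument is simple enough to spell out. The Python sketch below applies the two thresholds quoted above and then updates a prior probability ratio; the prior odds of 1:100 are a hypothetical value chosen only for illustration, and the label "below substantial" is ours rather than Jeffreys's.

```python
import math


def jeffreys_label(evidence_ratio):
    """Classify an evidence ratio using the two thresholds quoted in the text."""
    ln_ratio = math.log(evidence_ratio)
    if ln_ratio > 2.5:
        return "strong"
    if ln_ratio > 1.0:
        return "substantial"
    return "below substantial"  # label chosen here, not taken from Jeffreys


def posterior_odds(prior_odds, evidence_ratio):
    """Posterior probability ratio = prior probability ratio * evidence ratio."""
    return prior_odds * evidence_ratio


for ratio in (2.5, 2.7, 3.0):
    print(ratio, round(math.log(ratio), 3), jeffreys_label(ratio))

# Hypothetical prior odds of 1:100 against the anisotropic model.
print(posterior_odds(0.01, 3.0))
```

An evidence ratio must exceed $e \approx 2.72$ to count as "substantial" and $e^{2.5} \approx 12.2$ to count as "strong", so ratios between 2.5 and 3 sit at or just around the lower threshold.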

We used several different statistical approaches to
characterize the observed multipole alignment. Some 
(A, S) are adopted from previous work, while others 
(A,p) are of our own devising. In the latter case, we at- 
tempted to minimize (although not eliminate) the prob- 
lem of a posteriori statistics by choosing a method blindly 
that seemed to us to naturally encapsulate the observed 
phenomena with minimal arbitrary choices. In any case, 
the general consistency of the results based on the dif- 
ferent statistics indicates that the approach we have fol- 
lowed is robust. 

We have estimated the changes in cosmological param- 
eter estimates that would arise if the anisotropic models 
were shown to be correct. The chief effect of the modu- 
lation is on the estimate of the overall power spectrum 
normalization, which would of course have consequences 
for studies of large-scale structure. Our calculations are 
valid only if the modulation is applied to the CMB at 
all $l$-values measured by WMAP. If a more complicated
model is correct (e.g., [36]), in which only some scales are
modulated, then the parameter changes would presum- 
ably be smaller. 

We have used simulations of the ILC mapmaking pro- 
cess to evaluate the degree to which foreground contam- 
ination might affect our results. The statistic S appears 
least affected by this problem: ILC maps with high val- 
ues of S are very likely to correspond to high values of 
S in the intrinsic CMB. A thorough treatment of fore- 
grounds in our analysis would generically reduce the (al- 
ready modest) enhancements in the evidence ratio, so due 
to the effects of foregrounds our results can be regarded 
as upper limits. 

In this paper, we have tentatively adopted the point 
of view that there are anomalies to be explained. Of 
course, one would greatly prefer to settle this question in 
a way that was not plagued by the problem of a poste- 
riori statistics. To do this, we would require a new data 
set that probes similar scales to the large-angle CMB. 
All-sky polarization maps may provide some insight into 
these issues [13, 38]. Another possibility is to survey the
"remote quadrupole" signal found in the polarization of
CMB photons scattered in distant clusters [39], which
can be used to reconstruct information on
gigaparsec-scale perturbations [40, 41]. Although gathering data on
these scales is a difficult task, the potential for learning
about the structure of the Universe on the largest ob- 
servable scales makes it worth pursuing. 